SBASS: Segment based approach for subsequence searches in sequence databases

Authors

Sanghyun Park

Sang-Wook Kim

Wesley W. Chu

Abstract

A sequence database is a set of data sequences, each of which is an ordered list of elements [1]. Sequences of stock prices, money exchange rates, temperature data, product sales data, and company growth rates are typical examples of sequence databases [2, 8]. Similarity search is an operation that finds sequences or subsequences whose changing patterns are similar to that of a given query sequence [1, 2, 8]. Similarity search is of growing importance in many new applications such as data mining and data warehousing [6, 17]. There have been many research efforts [1, 7, 8, 10, 17] toward efficient similarity search in sequence databases using the Euclidean distance as the similarity measure. However, recent techniques [13–15, 18] tend to favor the time warping distance for its higher accuracy and wider applicability, at the expense of a high computation cost.

Time warping is a transformation that allows any sequence element to replicate itself as many times as needed without extra cost [18]. For example, the two sequences X = 〈20, 21, 21, 20, 20, 23, 23, 23〉 and Q = 〈20, 20, 21, 20, 23〉 can both be transformed into the identical sequence 〈20, 20, 21, 21, 20, 20, 23, 23, 23〉 by time warping. The time warping distance is defined as the smallest distance between two sequences transformed by time warping. Whereas the Euclidean distance can be used only when the two sequences being compared have the same length, the time warping distance can be applied to any two sequences of arbitrary lengths. Therefore, the time warping distance is well suited to databases in which sequences have different lengths. The time warping distance can be applied to both whole sequence and subsequence searches. Let us first consider the …
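The definition above can be made concrete with the standard dynamic-programming recurrence for time warping. This is a minimal sketch under assumptions the excerpt does not state: the distance between two individual elements is taken as |a - b|, costs are summed along the warping path, and the function name `time_warping_distance` is hypothetical. Applied to the example sequences X (length 8) and Q (length 5), it returns 0, since both can be warped into the same sequence.

```python
from typing import Sequence


def time_warping_distance(s: Sequence[float], q: Sequence[float]) -> float:
    """Time warping distance via dynamic programming (illustrative sketch).

    Assumption: the base distance is |a - b| and costs are summed along
    the warping path; the paper's exact definition may differ.
    """
    n, m = len(s), len(q)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # dp[i][j] holds the distance between the prefixes s[:i] and q[:j].
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            # Replicate q[j-1] (advance s only), replicate s[i-1]
            # (advance q only), or advance both sequences.
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


X = [20, 21, 21, 20, 20, 23, 23, 23]
Q = [20, 20, 21, 20, 23]
print(time_warping_distance(X, Q))  # 0.0: both warp to the same sequence
```

Filling the table takes time proportional to the product of the two sequence lengths, which is one source of the computation cost mentioned above.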

Similar resources

Segment-Based Approach for Subsequence Searches in Sequence Databases

This paper deals with the subsequence searching problem under time-warping in sequence databases. Our work is motivated by the observation that subsequence searches slow down quadratically as the average length of data sequences increases. To resolve this problem, the Segment-Based Approach for Subsequence Searches (SBASS) is proposed. The SBASS divides data and query sequences into a series of...

Faster sequence homology searches by clustering subsequences

Motivation: Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. Results: We developed a fast homology search method based on database subsequence clusteri...

Alignment of BLAST High-scoring Segment Pairs Based on the Longest Increasing Subsequence Algorithm

Motivation: The popular BLAST algorithm is based on a local similarity search strategy, so its high-scoring segment pairs (HSPs) do not have global alignment information. When scientists use BLAST to search for a target protein or DNA sequence in a huge database like the human genome map, the existence of repeated fragments, homologues or pseudogenes in the genome often makes the BLAST result fi...

InterWeaver: interaction reports for discovering potential protein interaction partners with online evidence

InterWeaver is a web server for discovering potential protein interactions with online evidence automatically extracted from protein interaction databases, literature abstracts, domain fusion events and domain interactions. Given a new protein sequence, the server identifies potential interaction partners using two approaches. In the homology-based approach, the system performs sequence homolog...

Reference-Based Alignment in Large Sequence Databases

This paper introduces a novel method, called Reference-Based String Alignment (RBSA), that speeds up retrieval of optimal subsequence matches in large databases of sequences under the edit distance and the Smith-Waterman similarity measure. RBSA operates under the assumption that the optimal match deviates by only a relatively small amount from the query, an amount that does not exceed a prespe...

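The first entry in this list attributes the motivation for SBASS to the observation that subsequence searches slow down quadratically as data sequences get longer. A brute-force scan shows where that growth comes from: a data sequence of length n has n(n + 1)/2 contiguous subsequences, and each one needs its own time warping computation against the query. The sketch below is that naive baseline, not SBASS, whose segment-based method is not described in this excerpt. The function names, the tolerance parameter `epsilon`, and the distance definition (|a - b| base cost summed along the warping path) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def tw_distance(s: Sequence[float], q: Sequence[float]) -> float:
    """Time warping distance (assumed: |a - b| base cost, summed along the path)."""
    n, m = len(s), len(q)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def naive_subsequence_search(
    data: Sequence[float], query: Sequence[float], epsilon: float
) -> Tuple[List[Tuple[int, int]], int]:
    """Brute-force baseline (hypothetical helper, not SBASS).

    Returns the (start, end) bounds of every subsequence data[start:end]
    within time warping distance `epsilon` of `query`, together with the
    number of distance computations performed.
    """
    matches: List[Tuple[int, int]] = []
    comparisons = 0
    n = len(data)
    # Every contiguous, non-empty subsequence is a candidate: n * (n + 1) / 2.
    for start in range(n):
        for end in range(start + 1, n + 1):
            comparisons += 1
            if tw_distance(data[start:end], query) <= epsilon:
                matches.append((start, end))
    return matches, comparisons


data = [20, 21, 21, 20, 20, 23, 23, 23]
matches, comparisons = naive_subsequence_search(data, [20, 21, 20], epsilon=0.0)
print(comparisons)  # 36 candidates for a sequence of length 8
print(matches)      # [(0, 4), (0, 5)]
```

Doubling the length of the data sequence roughly quadruples the number of candidate subsequences, before counting the cost of each time warping computation itself.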

Journal title:

Comput. Syst. Sci. Eng.

Volume 22

Publication date: 2007